We study community structure of networks. We have developed a scheme for maximizing the modularity Q
[1] based on mean field methods. Further, we have defined a simple family of random networks with community
structure; we understand the behavior of these networks analytically. Using these networks, we show how the
mean field methods display better performance than previously known deterministic methods for optimization
of Q.

PACS numbers: 



I. INTRODUCTION 

A theoretical foundation for understanding complex networks has developed rapidly over the course
of the past few years [2, 3, 4]. More recently, the subject of detecting network communities has
gained a large amount of attention; for reviews see Refs. [5, 6]. Community structure describes
the property of many networks that nodes divide into modules with dense connections between the
members of each module and sparser connections between modules.

In spite of a tremendous research effort, the mathematical tools developed to describe the
structure of large complex networks are continuously being refined and redefined. Essential
features related to network structure and topology are not necessarily captured by traditional
global features such as the average degree, degree distribution, average path length, clustering
coefficient, etc. In order to understand complex networks, we need to develop new measures that
capture these structural properties. Understanding community structures is an important step
towards developing a range of tools that can provide a deeper and more systematic understanding
of complex networks. One important reason is that modules in networks can show quite
heterogeneous behavior [7], that is, the link structure of modules can vary significantly from
module to module. For such heterogeneous systems, global measures can be directly misleading.
Also, in practical applications of network theory, knowledge of the community structure of a
given network is important. Access to the modular structure of the internet could help search
engines supply more relevant responses to queries on terms that belong to several distinct
communities (footnote 1). In biological networks, modules can correspond to functional units of
some biological system [8].



II. THE MODULARITY 

This section is devoted to an analysis of the modularity Q. Identifying communities in a graph
has a long history in mathematics and computer science [9]. One obvious way to partition a graph
into C communities is to distribute nodes into the communities such that the number of links
connecting the different modules of the network is minimized. The minimal number of connecting
links is called the cut size R of the network.

Footnote 1: Some search engines have begun implementing related ideas, see for example Clusty,
the Clustering Engine (http://clusty.com/). There is, however, still considerable room for
improvement.

Consider an unweighted and undirected graph with n nodes and m links. This network can be
represented by an adjacency matrix A with elements

$$A_{ij} = \begin{cases} 1, & \text{if there is a link joining nodes } i \text{ and } j; \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

This matrix is symmetric with 2m nonzero entries. The degree $k_i$ of node i is given by
$k_i = \sum_j A_{ij}$. Let us express the cut-size in terms of A; we find that

$$R = \frac{1}{2} \sum_{ij} A_{ij} \left[1 - \delta(c_i, c_j)\right], \tag{2}$$

where $c_i$ is the community to which node i belongs and $\delta(\alpha, \beta) = 1$ if
$\alpha = \beta$ and $\delta(\alpha, \beta) = 0$ if $\alpha \neq \beta$. Minimizing R is an
integer programming problem that can be solved exactly in polynomial time [10]. The leading
order of the polynomial, however, is $n^{C^2}$, which is very expensive for even very small
networks. Due to this fact, most graph partitioning has been based on spectral methods (more
below).
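As a small illustration of Eq. (2), the cut size can be evaluated directly from an adjacency matrix and a list of community labels. The sketch below is not part of the original work; the function name `cut_size` and the toy graph (two triangles joined by one link) are assumptions chosen for illustration.

```python
def cut_size(A, c):
    """Cut size R = (1/2) * sum_ij A_ij * [1 - delta(c_i, c_j)], as in Eq. (2)."""
    n = len(A)
    total = 0
    for i in range(n):
        for j in range(n):
            if c[i] != c[j]:          # 1 - delta(c_i, c_j) is 1 only across communities
                total += A[i][j]
    return total // 2                 # every inter-community link was counted twice


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

R_good = cut_size(A, [0, 0, 0, 1, 1, 1])   # split along the single bridging link
R_bad = cut_size(A, [0, 1, 0, 1, 0, 1])    # an arbitrary split of the same graph
```

Splitting along the bridge cuts one link, while the arbitrary split cuts five.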

Newman has argued [10, 11] that R is not the right quantity to minimize in the context of
complex networks. There are several reasons for this: First of all, the notion of cut-size does
not capture the essence of our 'definition' of network community structure as a tendency for
nodes to divide into modules with dense connections between the members of each module and
sparser connections between modules. According to Newman, a good division is not necessarily
one in which there are few edges between the modules; it is one where there are fewer edges
than expected. There are other problems with R: If we set the community sizes free, minimizing
R will tend to favor small communities, thus the use of R forces us to decide on and set the
sizes of the communities in advance.

As a solution to these problems, Girvan and Newman propose the modularity Q of a network [1],
defined as

$$Q = \frac{1}{2m} \sum_{ij} \left[A_{ij} - P_{ij}\right] \delta(c_i, c_j). \tag{3}$$

The $P_{ij}$, here, are a null model, designed to encapsulate the 'more edges than expected'
part of the intuitive network definition. $P_{ij}$ denotes the probability that a link exists
between nodes i and j. Thus, if we know nothing about the graph, an obvious choice would be to
set $P_{ij} = p$, where p is some constant probability. However, we know that the degree
distributions of real networks are often far from random, therefore the choice of
$P_{ij} \sim k_i k_j$ is sensible; this model implies that the probability of a link existing
between two nodes is proportional to the degrees of the two nodes in question. We will make
exclusive use of this null model in the following; the properly normalized version is
$P_{ij} = (k_i k_j)/(2m)$. It is axiomatically demanded that Q = 0 when all nodes are placed in
one single community. This constrains the $P_{ij}$ such that

$$\sum_{ij} P_{ij} = 2m; \tag{4}$$

we also note that $P = P^T$, which follows from the symmetry of A.
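To make Eq. (3) concrete, the following minimal sketch evaluates Q with the null model $P_{ij} = k_i k_j/(2m)$ and checks the requirement that Q = 0 when all nodes are placed in a single community. The function name `modularity` and the toy graph are assumptions made for this illustration.

```python
def modularity(A, c):
    """Q = (1/2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j), as in Eq. (3)."""
    n = len(A)
    k = [sum(row) for row in A]       # degrees k_i = sum_j A_ij
    two_m = sum(k)                    # 2m, the number of nonzero entries of A
    q = 0.0
    for i in range(n):
        for j in range(n):
            if c[i] == c[j]:          # delta(c_i, c_j)
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

Q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # the two triangles as communities
Q_single = modularity(A, [0] * n)             # everything in one community
```

For this graph the two-triangle division gives Q = 5/14, while the single-community division gives Q = 0 up to rounding, in line with Eq. (4).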

Comparing Eqs. (2) and (3), we notice that there are two differences between Q and R. The first
is that Q implies that we maximize the number of intra-community links instead of minimizing the
number of inter-community links as is the case for R; this is the difference between multiplying
by $\delta(c_i, c_j)$ and $[1 - \delta(c_i, c_j)]$. The second difference lies in the
introduction of the $P_{ij}$ in Equation (3). The subtraction of $P_{ij}$ serves to incorporate
information about the inter-community links into the quantity we are optimizing.

Use of modularity to identify network communities is not, however, completely unproblematic.
Criticism has been raised by Fortunato and Barthélemy [12] who point out that the Q measure has
a resolution limit. This stems from the fact that the null model $P_{ij} \sim k_i k_j$ can be
misleading. In a large network, the expected number of links between two small modules is small
and thus, a single link between two such modules is enough to join them into a single community.
A variation of the same criticism has been raised by Rosvall and Bergstrom [13]. These authors
point out that the normalization of $P_{ij}$ by the total number of links m has the effect that
if one adds a distinct (not connected to the remaining network) module to the network being
analyzed and partitions the whole network again allowing for an additional module, the division
of the original modules can shift substantially due to the increase of m.

In spite of these problems, the modularity is a highly interesting method for detecting
communities in complex networks when we keep in mind the limitations pointed out above. What
makes the modularity particularly interesting compared to other clustering methods is its
ability to inform us of the optimal number of communities for a given network (footnote 2).

Footnote 2: This ability to estimate the number of communities, however, stems from the
introduction of the $P_{ij}$ term in Eq. (3) and is therefore directly linked to the conceptual
problems with Q mentioned in the previous paragraph.



III. SPECTRAL OPTIMIZATION OF MODULARITY 

The question of finding the optimal Q is a discrete optimization problem. We can estimate the
size of the space we must search to find the maximum. The number of ways to divide n vertices
into C non-empty sets (communities) is given by the Stirling number of the second kind
$S_n^{(C)}$ [14]. Since we do not know the number of communities that will maximize Q before we
begin dividing the network, we need to examine a total of $\sum_{C=2}^{n} S_n^{(C)}$ community
divisions [15]. Even for small networks, this is an enormous space, which renders exhaustive
search out of the question.
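The growth of this search space is easy to check numerically. The sketch below uses the standard recurrence S(n, C) = C S(n-1, C) + S(n-1, C-1) for Stirling numbers of the second kind; the function names are assumptions made for this illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, c):
    """Number of ways to divide n labelled nodes into c non-empty communities."""
    if n == c:
        return 1
    if c == 0 or c > n:
        return 0
    # The last node either joins one of c existing communities or forms a new one.
    return c * stirling2(n - 1, c) + stirling2(n - 1, c - 1)


def num_divisions(n):
    """Total number of divisions into C = 2, ..., n communities."""
    return sum(stirling2(n, c) for c in range(2, n + 1))


small = num_divisions(4)     # 7 + 6 + 1 divisions of four nodes
medium = num_divisions(10)
large = num_divisions(30)    # already far beyond exhaustive enumeration
```

Four nodes admit 14 divisions and ten nodes already admit 115974.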

Motivated by the success of spectral methods in graph partitioning, Newman suggests a spectral
optimization of Q [11]. We define a matrix, called the modularity matrix, $B = A - P$ and an
(n × C) community matrix S. Each column of S corresponds to a community of the graph and each
row corresponds to a node, such that the elements

$$S_{ic} = \begin{cases} 1, & \text{if node } i \text{ belongs to community } c; \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$

Since each node can only belong to one community, the columns of S are orthogonal and
$\mathrm{Tr}(S^T S) = n$. The δ-symbol in Equation (3) can be expressed as

$$\delta(c_i, c_j) = \sum_{k=1}^{C} S_{ik} S_{jk}, \tag{6}$$

which allows us to express the modularity compactly as

$$Q = \frac{1}{2m} \sum_{i,j=1}^{n} \sum_{k=1}^{C} \left[A_{ij} - P_{ij}\right] S_{ik} S_{jk} = \frac{\mathrm{Tr}(S^T B S)}{2m}. \tag{7}$$

This is the quantity that we wish to maximize.
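The matrix form in Eq. (7) can be verified on a small example. The sketch below builds the community matrix S of Eq. (5) from a label list and evaluates $\mathrm{Tr}(S^T B S)/(2m)$ with plain Python lists; the helper names and the toy graph are assumptions made for this illustration.

```python
def community_matrix(c, C):
    """n x C matrix with S_ic = 1 if node i belongs to community c, else 0 (Eq. (5))."""
    return [[1 if label == col else 0 for col in range(C)] for label in c]


def modularity_trace(A, S):
    """Q = Tr(S^T B S) / (2m) with B = A - P and P_ij = k_i k_j / (2m) (Eq. (7))."""
    n, C = len(A), len(S[0])
    k = [sum(row) for row in A]
    two_m = sum(k)
    B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    trace = 0.0
    for col in range(C):
        s = [S[i][col] for i in range(n)]                       # one column of S
        Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
        trace += sum(s[i] * Bs[i] for i in range(n))            # s^T B s
    return trace / two_m


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

S = community_matrix([0, 0, 0, 1, 1, 1], 2)
Q_trace = modularity_trace(A, S)
trace_StS = sum(sum(row) for row in S)   # equals Tr(S^T S) because entries are 0 or 1
```

The trace form reproduces the value 5/14 obtained from the double sum in Eq. (3), and the sum of all entries of S equals n.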

The next step is the 'spectral relaxation', where we relax the discreteness constraints on S,
allowing elements of this matrix to take real values. We do, however, constrain the length of
the column vectors by $S^T S = M$, where M is a C × C matrix with the number of nodes in each
community $n_1, n_2, \ldots, n_C$ along the diagonal. In order to determine the maximum, we take

$$\frac{\partial}{\partial S} \left\{ \frac{1}{2m} \mathrm{Tr}[S^T B S] + \mathrm{Tr}[(S^T S - M)\tilde{\Lambda}] \right\} = 0, \tag{8}$$

where $\tilde{\Lambda}$ is a C × C diagonal matrix of Lagrange multipliers. The maximum is given
by

$$BS = S\Lambda, \tag{9}$$

where $\Lambda = -2m\tilde{\Lambda}$ for cosmetic reasons. Eq. (9) is a standard matrix
eigenvalue problem. Optimizing in the relaxed representation, we substitute this solution into
Eq. (7), and see that in order to maximize Q, we must choose the C largest eigenvalues of B and
their corresponding eigenvectors. Since all rows and columns of B sum to zero by definition, the
vector (1, 1, ..., 1) is always an eigenvector of B with the eigenvalue 0. In general the
modularity matrix can have both positive and negative eigenvalues. It is clear from Eq. (7) that
the eigenvectors corresponding to negative eigenvalues can never yield a positive contribution
to the modularity. Thus, the number of positive eigenvalues presents an upper bound on the
number of possible communities.

However, we need to convert our problem back to a discrete one. This is a non-trivial task.
There is no standard way to go from the n continuous entries in each of the C largest
eigenvectors of the modularity matrix and back to discrete 0/1 values of the community matrix S.
One simple way of circumventing this problem is to use repeated bisection of the network. This
is the procedure that Newman [11] recommends. In Newman's scheme, the only eigenvector utilized
is the eigenvector corresponding to the largest eigenvalue $b_{max}$ of B (with the highest
contribution to Q). The 0/1 vector most parallel to this continuous eigenvector is one where
the positive elements of the eigenvector are set to one and the negative elements to zero. This
is the first column of the community matrix S. The second column must contain the remaining
elements.
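A single bisection step of this kind can be sketched as follows. The sketch is not Newman's implementation: it finds the leading eigenvector by power iteration on B shifted by a Gershgorin bound (an assumption made here so that the largest algebraic eigenvalue dominates), uses dense lists, and is only meant for small graphs. All function names are hypothetical.

```python
import math


def modularity_matrix(A):
    """B = A - P with P_ij = k_i k_j / (2m)."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


def leading_eigenpair(B, iterations=500):
    """Power iteration on B + shift*I; the shift makes all eigenvalues non-negative."""
    n = len(B)
    shift = max(sum(abs(x) for x in row) for row in B)   # Gershgorin bound
    v = [float(i + 1) for i in range(n)]                 # arbitrary start vector
    for _ in range(iterations):
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    b_max = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
    return b_max, v


def spectral_bisection(A):
    """Positive eigenvector elements go to community 1, the rest to community 0."""
    b_max, v = leading_eigenpair(modularity_matrix(A))
    return b_max, [1 if x > 0 else 0 for x in v]


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

b_max, labels = spectral_bisection(A)
```

On this toy graph the sign pattern of the leading eigenvector separates the two triangles; which triangle receives label 1 depends on the arbitrary sign of the eigenvector.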

We can increase the modularity iteratively by bisecting the network into smaller and smaller
pieces. However, this repeated bisection of the network is problematic. There is no guarantee
that the best division into three groups can be arrived at by first determining the best
division into two and then dividing one of those two again. It is straightforward to construct
examples where a sub-optimal division into communities is obtained when using bisection
[11, 16].

Spectral optimization is not perfect, especially when only the eigenvector corresponding to
$b_{max}$ is employed (footnote 3). Therefore, Newman suggests that it should only be used as a
starting point. In order to improve the modularity, Newman has devised an algorithm inspired by
the classical Kernighan-Lin (KL) scheme [17]. The procedure is as follows: After each bisection
of the network we go through the nodes and find the one that yields the highest increase in the
modularity of the entire network (or smallest decrease if no increase is possible) if moved to
the other module. This node is now moved to the other module and becomes inactive. The next
step is to go through the remaining n − 1 nodes and perform the same action. We continue like
this until all nodes have been moved. Finally, we go through all the intermediate states and
pick the one with the highest value of Q. This is the new starting division. We proceed
iteratively from this configuration until no further improvement can be found. Let us call this
optimization the 'KLN-algorithm'.
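The procedure described above can be sketched for a single bisection as follows. This is a deliberately naive illustration that recomputes Q from scratch for every trial move, so it is far slower than an efficient implementation; the function names and the starting division are assumptions.

```python
def modularity(A, c):
    """Q of Eq. (3) with P_ij = k_i k_j / (2m)."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    q = sum(A[i][j] - k[i] * k[j] / two_m
            for i in range(n) for j in range(n) if c[i] == c[j])
    return q / two_m


def kl_refine(A, labels, tol=1e-12):
    """Kernighan-Lin-style refinement of a bisection (labels are 0 or 1)."""
    n = len(A)
    best, best_q = list(labels), modularity(A, labels)
    improved = True
    while improved:
        improved = False
        current = list(best)
        unmoved = set(range(n))
        sweep_best, sweep_best_q = None, best_q
        while unmoved:
            # Pick the still-active node whose move gives the highest Q
            # (or the smallest decrease if no increase is possible).
            node, node_q = None, None
            for i in sorted(unmoved):
                current[i] = 1 - current[i]
                q = modularity(A, current)
                current[i] = 1 - current[i]
                if node_q is None or q > node_q:
                    node, node_q = i, q
            current[node] = 1 - current[node]   # move it; it is now inactive
            unmoved.remove(node)
            if node_q > sweep_best_q + tol:     # remember the best intermediate state
                sweep_best, sweep_best_q = list(current), node_q
        if sweep_best is not None:              # restart from the best state found
            best, best_q = sweep_best, sweep_best_q
            improved = True
    return best, best_q


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

refined, refined_q = kl_refine(A, [0, 0, 0, 0, 1, 1])   # deliberately poor start
```

Starting from a division that misplaces node 3, the first sweep moves it to the other module and the second sweep finds no further improvement.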

In the spectral optimization, the computational bottleneck is the calculation of the leading
eigenvector(s) of B, which is non-sparse. Naively, we would expect this to scale like $O(n^3)$.
However, B's structure allows for a faster calculation. We can write the product of B and a
vector v [11] as

$$Bv = Av - \frac{k (k^T v)}{2m}. \tag{10}$$

This way we have divided the multiplication into (i) a sparse matrix product with the adjacency
matrix that takes $O(m + n)$, and (ii) the inner product $k^T v$ that takes $O(n)$. Thus the
entire product Bv scales like $O(m + n)$. The total running time for a bisection determining
the eigenvector(s) is therefore $O((m + n)n)$ rather than the naive guess of $O(n^3)$. Using
Eq. (10) during the KLN-algorithm reduces the cost of this step to $O((m + n)n)$ [11].

Footnote 3: Newman has proposed a scheme that utilizes two eigenvectors of the modularity matrix
corresponding to the two highest eigenvalues [7] that, according to our experiments, performs
slightly better than the single eigenvector method described above. However, after the
application of the KLN-algorithm described in this section, we found no difference in the
results found by using one or two eigenvectors.
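The product in Eq. (10) can be sketched with an adjacency-list representation, so that the dense matrix B is never formed. The names `fast_Bv` and `dense_Bv` and the test vector are assumptions made for this illustration; the dense version is included only as a cross-check.

```python
def fast_Bv(neighbors, v):
    """B v = A v - k (k^T v) / (2m), using only the adjacency lists (Eq. (10))."""
    n = len(neighbors)
    k = [len(nb) for nb in neighbors]                     # degrees
    two_m = sum(k)
    k_dot_v = sum(k[i] * v[i] for i in range(n))          # inner product, O(n)
    # Sparse product A v costs O(m + n): each link is visited twice.
    return [sum(v[j] for j in neighbors[i]) - k[i] * k_dot_v / two_m
            for i in range(n)]


def dense_Bv(neighbors, v):
    """Reference computation that builds B explicitly, O(n^2)."""
    n = len(neighbors)
    k = [len(nb) for nb in neighbors]
    two_m = sum(k)
    B = [[(1 if j in neighbors[i] else 0) - k[i] * k[j] / two_m for j in range(n)]
         for i in range(n)]
    return [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
neighbors = [set() for _ in range(6)]
for i, j in edges:
    neighbors[i].add(j)
    neighbors[j].add(i)

v = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0]
fast = fast_Bv(neighbors, v)
dense = dense_Bv(neighbors, v)
max_difference = max(abs(a - b) for a, b in zip(fast, dense))
```

Both routines agree to rounding error; only the first one avoids the quadratic cost.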



IV. MEAN FIELD OPTIMIZATION 

Simulated annealing was proposed by Kirkpatrick et al. [18] who noted the conceptual similarity
between global optimization and finding the ground state of a physical system. Formally,
simulated annealing maps the global optimization problem onto a physical system by identifying
the cost function with the energy function and by considering this system to be in equilibrium
with a heat bath of a given temperature T. By annealing, i.e., slowly lowering the temperature
of the heat bath, the probability of the ground state of the physical system grows towards
unity. This is contingent on whether or not the temperature can be decreased slowly enough such
that the system stays in equilibrium, i.e., that the probability is Gibbsian

$$P(S|T) = \frac{1}{Z} \exp\left(\frac{Q(S)}{T}\right). \tag{11}$$

Here, Z is a constant ensuring proper normalization. Kirkpatrick et al. realized the annealing
process by Monte Carlo sampling. The representation of the constrained modularity optimization
problem is equivalent to a C-state Potts model. Gibbs sampling for the Potts model with the
modularity Q as energy function has been investigated by Reichardt and Bornholdt, see e.g.,
[16].

Mean field annealing is a deterministic alternative to Monte Carlo sampling for combinatorial
optimization and was pioneered by Peterson et al. [19, 20]. Mean field annealing avoids
extensive stochastic simulation and equilibration, which makes the method particularly well
suited for optimization. There is a close connection between Gibbs sampling and MF annealing.
In Gibbs sampling, every variable is updated by a random draw of a Potts state from the
conditional distribution

$$P(S_{i1}, \ldots, S_{iC} \mid S_{\setminus i}, T) = \frac{P(S|T)}{\sum_{S_{i1}, \ldots, S_{iC}} P(S|T)}, \tag{12}$$

where the sum runs over the C values of the i'th Potts variable and $S_{\setminus i}$ denotes
the set of Potts variables excluding the i'th node. As noted in [16], Eq. (12) is local in the
sense that the part of the energy function containing variables not connected with the i'th
cancels out in the fraction. The mean field approximation is obtained by computing the
conditional mean of the set of variables coding for the i'th Potts variable using Eq. (12) and
approximating the Potts variables in the conditional probability by their means [20]. This
leads to a simple self-consistent set of non-linear equations for the means,

$$\mu_{ik} = \frac{\exp(\phi_{ik}/T)}{\sum_{k'=1}^{C} \exp(\phi_{ik'}/T)}, \qquad \phi_{ik} = \sum_j B_{ij} \mu_{jk}. \tag{13}$$

For symmetric connectivity matrices with $\sum_j B_{ij} = 0$, the set of mean field equations
has the unique high-temperature solution $\mu_{ik} = 1/C$. This solution becomes unstable at
the mean field critical temperature, $T_c = b_{max}/C$, determined by the maximal eigenvalue
$b_{max}$ of B.
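One synchronous pass of the self-consistent equations in Eq. (13) can be sketched as below. The function name `mean_field_update`, the temperature and the toy graph are assumptions made for this illustration; the maximum of the fields is subtracted before exponentiation purely for numerical stability, which leaves the ratio in Eq. (13) unchanged.

```python
import math


def mean_field_update(B, mu, T):
    """Return new means mu_ik = exp(phi_ik/T) / sum_k' exp(phi_ik'/T), phi = B mu."""
    n, C = len(mu), len(mu[0])
    new_mu = []
    for i in range(n):
        phi = [sum(B[i][j] * mu[j][k] for j in range(n)) for k in range(C)]
        top = max(phi)
        weights = [math.exp((x - top) / T) for x in phi]   # stable softmax
        z = sum(weights)
        new_mu.append([w / z for w in weights])
    return new_mu


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n, C = 6, 3
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1
k = [sum(row) for row in A]
two_m = sum(k)
B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]

uniform = [[1.0 / C] * C for _ in range(n)]
after_uniform = mean_field_update(B, uniform, T=1.0)

tilted = [[0.5, 0.3, 0.2] if i < 3 else [0.2, 0.3, 0.5] for i in range(n)]
after_tilted = mean_field_update(B, tilted, T=0.2)
```

Because the rows of B sum to zero, the uniform configuration produces identical fields for every community and is mapped onto itself, which is the high-temperature solution mentioned above.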

This mean field algorithm is fast. Each synchronous iteration (see Section VI for details on
implementation) requires a multiplication of B by the mean vector μ. As we have seen, this
operation can be performed in $O(m + n)$ time using the trick in Eq. (10). In these
experiments, we have used a fixed number of iterations of the order of $O(n)$, which gives us a
total of $O((m + n)n)$, similar to the case of spectral optimization. (A forthcoming paper
discusses the relationship between Gibbs sampling, mean field methods, and computational
complexity.)



V. A SIMPLE NETWORK 

We will perform our numerical experiments on a simple model of networks with communities. This
model network consists of C communities with $n_c$ nodes in each; the total network has
$n = n_c C$ nodes. Without loss of generality, we can arrange our nodes according to their
community; a sketch of this type of network is displayed in Figure 1. Communities are defined
as standard random networks, where the probability of a link between two nodes is given by p,
with 0 < p ≤ 1. Between the communities, the probability of a link is given by q, with
0 ≤ q ≤ p. The networks are unweighted and undirected.

FIG. 1: A sketch of the simple network model. The figure displays the structure of the
adjacency matrix with nodes arranged according to community. Inside each community (the blocks)
along the diagonal, the probability of a link between two nodes is p and between communities,
the probability of a link is q.

Let us calculate Q for this network in the case where p = 1 and q = 0. In this case, we can
calculate everything exactly. First, we note that all nodes have the same number of links, and
that the degree of node i is $k_i = n_c - 1$ (since a node does not link to itself). Thus the
total number of links $m_c$ in each sub-network is

$$m_c = \frac{1}{2} n_c (n_c - 1), \tag{14}$$

and since our network consists of C identical communities the total number of links is
$m = C m_c$. We can now write down the contribution $Q_c$ from each sub-network to the total
modularity

$$Q_c = \frac{1}{2m} \sum_{i,j \in c} (A_{ij} - P_{ij}) \tag{15}$$

$$\quad\;\, = \frac{1}{2m} \left[ n_c (n_c - 1) - \frac{n_c^2 (n_c - 1)^2}{2m} \right]. \tag{16}$$

If we insert m and use that $Q = C Q_c$, we find

$$Q = C Q_c = 1 - \frac{1}{C}. \tag{17}$$

We see explicitly that when C → ∞ the modularity approaches unity.

Now, let us examine the general case. Since our network is connected at random, we cannot
calculate the number of links per node exactly, but we know that the network is well-behaved
(Poisson link distribution), thus we can calculate the average number of links per node. We see
that

$$\bar{k} = (n_c - 1) p + n_c (C - 1) q, \tag{18}$$

which is equal to the expected number of intra-community links plus the expected number of
inter-community links per node. The number of links in the entire network is therefore given by

$$m = \frac{1}{2} C n_c \bar{k} = \frac{C n_c}{2} \left[ (n_c - 1) p + n_c (C - 1) q \right]. \tag{19}$$

We write down Q

$$Q = C Q_c = \frac{C}{2m} \left[ n_c (n_c - 1) p - \frac{n_c^2 \{ (n_c - 1) p + n_c (C - 1) q \}^2}{2m} \right] = \frac{(n_c - 1) p}{(n_c - 1) p + n_c (C - 1) q} - \frac{1}{C}. \tag{20}$$

When $n_c \gg 1$ (which is always the case), we have that

$$Q \approx \frac{p}{p + q (C - 1)} - \frac{1}{C}. \tag{21}$$

When we write q as some fraction f of p, that is q = fp, with 0 ≤ f ≤ 1, we find

$$Q(f, C) = \frac{1}{1 + (C - 1) f} - \frac{1}{C}, \tag{22}$$

which is independent of p. Thus, for this simple network, the only two relevant parameters are
the number of communities and the density of the inter-community links relative to the
intra-community strength. We can also see that our result from Eq. (17) is valid even in the
case p < 1, as long as the communities are connected and q = 0.

FIG. 2: Equation (22) and $Q_{design}$. This figure displays Q as a function of f (the relative
probability of a link between communities), with C = 5 for the simple network defined in
Figure 1. The blue line is given by Eq. (22) and the black dots with error-bars are mean values
of $Q_{design}$ in realizations of the simple network with p = 1/10 and n = 500; each
data-point is the mean of 100 realizations. The error bars are calculated as the standard
deviation divided by the square root of the number of runs.

If we design an adjacency matrix according to Figure 1, we can calculate the value
$Q_{design} = \mathrm{Tr}(S_d^T B S_d)/(2m)$, where $S_d$ is a community matrix that reflects
the designed communities. Values of $Q_{design}$ should correspond to Eq. (22). We see in
Figure 2 that this expectation is indeed fulfilled. The blue curve is Q as a function of f with
C = 5. The black dots with error-bars are mean values of $Q_{design}$ in realizations of the
simple network with p = 1/10 and n = 500; each data-point is the mean of 100 realizations and
the error bars are calculated as the standard deviation divided by the square root of the
number of runs. The correspondence between prediction and experiment is quite compelling.
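The comparison between $Q_{design}$ and Eq. (22) can be reproduced in a few lines. The sketch below is an illustration rather than the code used for the experiments: the function names, the random seed and the single noisy realization are assumptions, and $Q_{design}$ is evaluated per community from the number of intra-community links and the summed degrees, which is algebraically the same as $\mathrm{Tr}(S_d^T B S_d)/(2m)$.

```python
import random


def planted_network(C, n_c, p, q, rng):
    """Edge list of C communities of n_c nodes; link probability p inside, q between."""
    n = C * n_c
    labels = [i // n_c for i in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                edges.append((i, j))
    return edges, labels


def q_design(edges, labels, C):
    """Modularity of the designed communities."""
    two_m = 2 * len(edges)
    intra = [0] * C       # intra-community links per community
    degree = [0] * C      # summed node degrees per community
    for i, j in edges:
        degree[labels[i]] += 1
        degree[labels[j]] += 1
        if labels[i] == labels[j]:
            intra[labels[i]] += 1
    return sum(2 * intra[c] / two_m - (degree[c] / two_m) ** 2 for c in range(C))


def q_predicted(f, C):
    """Eq. (22): Q(f, C) = 1 / (1 + (C - 1) f) - 1 / C."""
    return 1.0 / (1.0 + (C - 1) * f) - 1.0 / C


rng = random.Random(0)

# Exact case p = 1, q = 0: disjoint cliques, Q should equal 1 - 1/C.
clique_edges, clique_labels = planted_network(5, 20, 1.0, 0.0, rng)
q_exact = q_design(clique_edges, clique_labels, 5)

# One noisy realization with the parameters used in the text and f = 0.3.
f = 0.3
noisy_edges, noisy_labels = planted_network(5, 100, 0.1, 0.1 * f, rng)
q_noisy = q_design(noisy_edges, noisy_labels, 5)
```

A single realization fluctuates around the prediction, which is why the text averages over 100 realizations per data point.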

We should note, however, that the value of $Q_{design}$ may be lower than the actual modularity
found for the network by a good algorithm: We can imagine that fluctuations of the
inter-community links could result in configurations that would yield higher values of Q,
especially for high values of f. We can quantify this quite precisely. Reichardt and Bornholdt
[16] have demonstrated that random networks can display significantly larger values of Q due to
fluctuations; when f = 1, our simple network is precisely a random network (see also related
work by Guimerà et al. [21]). In the case of the network we are experimenting on (n = 500,
p = 1/10), they predict Q ≈ 0.13.

Thus, we expect that the curve for Q(f, C) with fixed C will deviate from the $Q_{design}$
displayed in Figure 2, especially for values of f that are close to unity. The line will
decrease monotonically from Q(0, C) = 1 − 1/C towards Q(1, C) = 0.11 with the difference
becoming maximal as f → 1.



VI. NUMERICAL EXPERIMENTS 

We know that the running time of the mean field method scales like that of the spectral
solution. In order to compare the precision of the mean field solutions to the solutions
stemming from spectral optimization, we have created a number of test networks with adjacency
matrices designed according to Figure 1. We have created 100 test networks using parameters
$n_c = 100$, C = 5, p = 0.1 and f ∈ [0, 1]. Varying f over this interval allows us to
interpolate between a model with C disjoint communities and a random network with no community
structure.

We applied the following three algorithms to our test networks:

1. Spectral optimization,

2. Spectral optimization and the KLN-algorithm, and

3. Mean field optimization.

Spectral optimization and the KLN-algorithm were implemented as prescribed in [11]. The nC
non-linear mean field annealing equations were solved approximately using a D = 300-step
annealing schedule linear in β = 1/T, starting at $\beta_c$ and ending in $3\beta_c$, at which
temperature the majority of the mean field variables are saturated. The mean field critical
temperature $T_c = b_{max}/C$ is determined for each connectivity matrix. The synchronous
update scheme, defined as a parallel update of all means at each of the D temperatures,

$$\mu_{ik}^{(d+1)} = \frac{\exp(\phi_{ik}^{(d)}/T)}{\sum_{k'=1}^{C} \exp(\phi_{ik'}^{(d)}/T)}, \qquad \phi_{ik}^{(d)} = \sum_j B_{ij} \mu_{jk}^{(d)}, \tag{23}$$

can grow unstable at low temperatures. A slightly more effective and stable update scheme is
obtained by selecting random fractions ρ < 1 of the means for update in 1/ρ steps at each
temperature. We use ρ = 0.2 in the experiments reported below. A final T = 0 iteration,
equivalent to making a decision on the node community assignment, completes the procedure. We
do not assume that the actual number of communities $C \le C_{max}$ is known in advance. In
these experiments we use $C_{max} = 8$. The number of communities C is determined after
convergence by counting the number of non-empty communities.
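The annealing procedure described above can be sketched end to end on a small graph. This is a minimal illustration under several assumptions that are not taken from the original implementation: the initial means are the uniform solution plus small random noise, each mean is selected for update independently with probability ρ in each of the 1/ρ steps, $b_{max}$ is found by shifted power iteration, and dense lists are used instead of the fast product of Eq. (10). All function names are hypothetical.

```python
import math
import random


def modularity_matrix(A):
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


def largest_eigenvalue(B, iterations=500):
    """Shifted power iteration; assumes the graph yields b_max > 0."""
    n = len(B)
    shift = max(sum(abs(x) for x in row) for row in B)
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))


def fields(B, mu, i, C):
    """phi_ik = sum_j B_ij mu_jk for one node i."""
    return [sum(B[i][j] * mu[j][k] for j in range(len(B))) for k in range(C)]


def mean_field_annealing(A, C_max, D=300, rho=0.2, seed=0):
    rng = random.Random(seed)
    n = len(A)
    B = modularity_matrix(A)
    beta_c = C_max / largest_eigenvalue(B)          # 1 / T_c with T_c = b_max / C
    mu = []
    for _ in range(n):                              # noisy start near mu_ik = 1/C
        row = [1.0 / C_max + 0.01 * rng.random() for _ in range(C_max)]
        z = sum(row)
        mu.append([x / z for x in row])
    for d in range(D):
        beta = beta_c + 2.0 * beta_c * d / (D - 1)  # linear in beta, beta_c to 3 beta_c
        for _ in range(int(round(1.0 / rho))):
            chosen = [i for i in range(n) if rng.random() < rho]
            updates = {}
            for i in chosen:                        # parallel update of the chosen means
                phi = fields(B, mu, i, C_max)
                top = max(phi)
                w = [math.exp(beta * (x - top)) for x in phi]
                z = sum(w)
                updates[i] = [x / z for x in w]
            for i, row in updates.items():
                mu[i] = row
    # Final T = 0 iteration: assign each node to the community with the largest field.
    labels = []
    for i in range(n):
        phi = fields(B, mu, i, C_max)
        labels.append(phi.index(max(phi)))
    return labels


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

labels = mean_field_annealing(A, C_max=3)
n_communities = len(set(labels))    # count the non-empty communities
```

The number of communities is read off after the final hard assignment, so `C_max` only acts as an upper bound, as in the text.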

The results of the numerical runs are displayed in Figure 3. This figure shows the point-wise
differences between the value of $Q_{algorithm}$ found by the algorithm in question and
$Q_{design}$, plotted as a function of the inter-community noise f. The line
$Q_{algorithm} - Q_{design} = 0$ thus corresponds to the curve plotted in Figure 2. We see from
Figure 3 that the mean field approach uniformly out-performs both spectral optimization and
spectral optimization with KLN post-processing. We also ran a Gibbs sampler [16] with a
computational complexity equivalent to the mean field approach. This led to communities with Q
slightly lower than the mean field results, but still better than spectral optimization with
KLN post-processing.

FIG. 3: Comparing spectral methods with the mean field solution. The networks were created
according to the simple model, using parameters $n_c = 100$, C = 5, p = 0.1 and f ∈ [0, 1]. All
data points display the point-wise differences between the value of $Q_{algorithm}$ found by
the algorithm in question and $Q_{design}$. The error-bars are calculated as in Figure 2. The
dash-dotted red line shows the results for the spectral method. The dashed blue line shows the
results for the spectral optimization followed by KLN post-processing. The solid black curve
shows the results for the mean field optimization. The grey, horizontal line corresponds to the
theoretical prediction (Eq. (22)) for the designed communities.

FIG. 4: The median number of communities found by the various algorithms. The panel shows the
median number of communities as a function of the relative fraction of inter-community links f.
All optimization schemes consistently pick four or five communities for the highest values of
f. This finding is consistent with theoretical and experimental results by Reichardt and
Bornholdt [16].



We note that the obtained $Q_{algorithm}$ for a random network (f = 1) is consistent with the
prediction made by Reichardt and Bornholdt [16]. We also see that the optimization algorithms
can exploit random connections to find higher values of $Q_{algorithm}$ than expected for the
designed communities $Q_{design}$. In the case of the mean field algorithm this effect is
visible for values of f as low as 0.2.

Figure 4 shows the median number of communities found by the various algorithms as a function
of f. It is evident from Figs. 3 and 4 that, for this particular set of parameters, the problem
of detecting the designed community structure is especially difficult around f = 0.3. Spectral
clustering with and without the KLN algorithm finds values of $Q_{algorithm}$ that are
significantly lower than $Q_{design}$. The mean field algorithm manages to find a value of
$Q_{algorithm}$ that is higher than the designed Q but does so by creating extra communities.
As f → 1 it becomes more and more difficult to recover the designed number of communities.



VII. CONCLUSIONS 

We have introduced a deterministic mean field annealing approach to optimization of modularity
Q. We have evaluated the performance of the new algorithm within a family of networks with
variable levels of inter-community links, f. Even with a rather costly post-processing
approach, the spectral clustering approach suggested by Newman is consistently out-performed by
the mean field approach for higher noise levels. Spectral clustering without the KLN
post-processing finds much lower values of Q for all f > 0.

Speed is not the only benefit of the mean field approach. Another advantage is that the
implementation of mean field annealing is rather simple and similar to Gibbs sampling. This
method also avoids the inherent problems of repeated bisection. The deterministic annealing
scheme is directed towards locating optimal configurations without wasting time on careful
thermal equilibration at higher temperatures. As we have noted above, the modularity measure Q
may need modification in specific non-generic networks. In that case, we note that the mean
field method is quite general and can be generalized to many other measures.